Learning and Selecting the Right Customers for Reliability: A Multi-armed Bandit Approach

Authors

  • Yingying Li
  • Qinran Hu
  • Na Li
Abstract

In this paper, we consider residential demand response (DR) programs where an aggregator calls upon some residential customers to change their demand so that the total load adjustment is as close to a target value as possible. Major challenges lie in the uncertainty and randomness of the customer behaviors in response to DR signals, and the limited knowledge available to the aggregator of the customers. To learn and select the right customers, we formulate the DR problem as a combinatorial multi-armed bandit (CMAB) problem with a reliability goal. We propose a learning algorithm: CUCB-Avg (Combinatorial Upper Confidence Bound-Average), which utilizes both upper confidence bounds and sample averages to balance the tradeoff between exploration (learning) and exploitation (selecting). We prove that CUCB-Avg achieves O(log T) regret given a time-invariant target, and o(T) regret when the target is time-varying. Simulation results demonstrate that our CUCB-Avg performs significantly better than the classic algorithm CUCB (Combinatorial Upper Confidence Bound) in both time-invariant and time-varying scenarios.
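The abstract does not spell out the selection rule, so the following is only a rough sketch of one round of a CUCB-Avg-style selection under assumed details: a Hoeffding-style confidence radius, customers ranked by their upper confidence bounds, and sample averages used to decide how many customers to call before the target is met. The function name, the radius constant, and the stopping rule are illustrative assumptions, not the paper's exact algorithm.

```python
import math

def cucb_avg_select(means, counts, t, target):
    """One round of a CUCB-Avg-style customer selection (illustrative sketch).

    means[i]  -- sample-average load adjustment observed for customer i
    counts[i] -- number of times customer i has been selected so far
    t         -- current round index (t >= 1)
    target    -- desired total load adjustment for this round
    """
    n = len(means)
    # Optimistic estimate per customer; unseen customers get +inf to force exploration.
    ucb = [
        means[i] + math.sqrt(3.0 * math.log(t) / (2.0 * counts[i]))
        if counts[i] > 0 else float("inf")
        for i in range(n)
    ]
    # Rank customers by UCB (exploration), but accumulate their sample averages
    # (exploitation) and stop once the estimated total reaches the target.
    selected, est_total = [], 0.0
    for i in sorted(range(n), key=lambda i: ucb[i], reverse=True):
        if est_total >= target:
            break
        selected.append(i)
        est_total += means[i]
    return selected
```

By contrast, a classic CUCB-style rule would accumulate the optimistic UCB values themselves when checking against the target, which is more aggressive early on; this difference is one plausible reading of why the abstract reports CUCB-Avg outperforming CUCB in both time-invariant and time-varying settings.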

Similar Articles

A Bayesian Bandit Approach to Personalized Online Coupon Recommendations

A digital coupon distributing firm selects coupons from its coupon pool and posts them online for its customers to activate. Its objective is to maximize the total number of clicks that activate the coupons by sequentially arriving customers. This paper resolves this problem by using a multi-armed bandit approach to balance the exploration (learning customers' preference for coupons) with ex...
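The excerpt does not specify which Bayesian bandit the paper uses; as a generic illustration of the exploration/exploitation balance it mentions, a Beta-Bernoulli Thompson sampling loop over a coupon pool could look like the sketch below. The click rates and the simulation loop are hypothetical.

```python
import random

def choose_coupon(successes, failures):
    """Sample a click-rate from each coupon's Beta posterior and show the best draw."""
    draws = [random.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda i: draws[i])

# Hypothetical pool of three coupons with unknown true activation rates.
true_rates = [0.05, 0.12, 0.08]
successes, failures = [0, 0, 0], [0, 0, 0]
for _ in range(10_000):
    i = choose_coupon(successes, failures)
    if random.random() < true_rates[i]:   # customer activates the shown coupon
        successes[i] += 1
    else:
        failures[i] += 1
print("Posterior mean click rates:",
      [round((s + 1) / (s + f + 2), 3) for s, f in zip(successes, failures)])
```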

A quality assuring multi-armed bandit crowdsourcing mechanism with incentive compatible learning

We develop a novel multi-armed bandit (MAB) mechanism for the problem of selecting a subset of crowd workers to achieve an assured accuracy for each binary labelling task in a cost optimal way. This problem is challenging because workers have unknown qualities and strategic costs.

Budgeted Learning, Part I: The Multi-Armed Bandit Case

We introduce and motivate the task of learning under a budget. We focus on a basic problem in this space: selecting the optimal bandit after a period of experimentation in a multi-armed bandit setting, where each experiment is costly, our total costs cannot exceed a fixed pre-specified budget, and there is no reward collection during the learning period. We address the computational complexity ...

An Optimal Online Method of Selecting Source Policies for Reinforcement Learning

Transfer learning significantly accelerates the reinforcement learning process by exploiting relevant knowledge from previous experiences. The problem of optimally selecting source policies during the learning process is of great importance yet challenging. There has been little theoretical analysis of this problem. In this paper, we develop an optimal online method to select source policies fo...

Sequential Transfer in Multi-armed Bandit with Finite Set of Models

Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly improve the learning performance, most of the literature on transfer is focused on batch learning tasks. In this paper we study the problem of sequential transfer...


Journal:

Volume   Issue

Pages   -

Publication date: 2018